Nonlinear random matrix theory for deep learning Nonlinear random matrix theory for deep learning

نویسندگان

  • Jeffrey Pennington
  • Pratik Worah
چکیده

Neural network configurations with random weights play an important role in the analysis of deep learning. They define the initial loss landscape and are closely related to kernel and random feature methods. Despite the fact that these networks are built out of random matrices, the vast and powerful machinery of random matrix theory has so far found limited success in studying them. A main obstacle in this direction is that neural networks are nonlinear, which prevents the straightforward utilization of many of the existing mathematical results. In this work, we open the door for direct applications of random matrix theory to deep learning by demonstrating that the pointwise non-linearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method. The test case for our study is the Gram matrix Y Y , Y = f(WX), where W is a random weight matrix, X is a random data matrix, and f is a point-wise non-linear activation function. We derive an explicit representation for the trace of the resolvent of this matrix, which defines its limiting spectral distribution. We apply these results to the computation of the asymptotic performance of single-layer random feature networks on a memorization task and to the analysis of the eigenvalues of the data covariance matrix as it propagates through a neural network. As a byproduct of our analysis, we identify an intriguing new class of activation functions with favorable properties.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Nonlinear random matrix theory for deep learning

Neural network configurations with random weights play an important role in the analysis of deep learning. They define the initial loss landscape and are closely related to kernel and random feature methods. Despite the fact that these networks are built out of random matrices, the vast and powerful machinery of random matrix theory has so far found limited success in studying them. A main obst...

متن کامل

Deep Semi-Random Features for Nonlinear Function Approximation

We propose semi-random features for nonlinear function approximation. The flexibility of semirandom feature lies between the fully adjustable units in deep learning and the random features used in kernel methods. For one hidden layer models with semi-random features, we prove with no unrealistic assumptions that the model classes contain an arbitrarily good function as the width increases (univ...

متن کامل

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output ma...

متن کامل

Learning Deep Representations By Distributed Random Samplings

In this paper, we propose an extremely simple deep model for the unsupervised nonlinear dimensionality reduction – deep distributed random samplings. First, its network structure is novel: each layer of the network is a group of mutually independent k-centers clusterings. Second, its learning method is extremely simple: the k centers of each clustering are only k randomly selected examples from...

متن کامل

Structured Image Classification from Conditional Random Field with Deep Class Embedding

This paper presents a novel deep learning architecture to classify structured objects in datasets with a large number of visually similar categories. Our model extends the CRF objective function to a nonlinear form, by factorizing the pairwise potential matrix, to learn neighboring-class embedding. The embedding and the classifier are jointly trained to optimize this highly nonlinear CRF object...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017